Identification of equipment location in data center

ABSTRACT

Techniques are disclosed for identifying the locations of equipment in computing environments such as data centers. For example, a method of identifying a plurality of computing devices in a computing environment includes the following steps. A first temperature-indicative map of the plurality of computing devices is generated while the plurality of computing devices is in a first mode. The first temperature-indicative map is overlaid with a facility map associated with the computing environment. A first one of the plurality of computing devices is placed into a second mode. A second temperature-indicative map of the plurality of computing devices is generated while the first one of the plurality of computing devices is in the second mode and the remainder of the plurality of computing devices is in the first mode. The second temperature-indicative map is overlaid with the facility map associated with the computing environment. The first temperature-indicative map is compared with the second temperature-indicative map to determine a location of the first one of the plurality of computing devices.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is related to the U.S. patent application identified by attorney docket no. YOR920070190US1, entitled “Identification of Equipment Location in Data Center,” filed concurrently herewith, and the disclosure of which is incorporated by reference herein.

FIELD OF THE INVENTION

The present application relates to data centers and, more particularly, to techniques for identifying locations of equipment in such data centers.

BACKGROUND OF THE INVENTION

Currently, virtualization is an important topic in the IT (Information Technology) industry. The promise of virtualization is the ability to manage multiple computing devices (e.g., servers) in order to optimize a certain metric. The most common metric is server utilization, i.e., workload is arranged in such a way that the CPU (central processing unit) is not in contention for resources and is able to achieve maximum throughput. For this kind of optimization, the locations of the servers are not important.

A new metric to optimize is the power consumption. There are two reasons: (1) rising cost of energy; and (2) new servers such as computer blades consume over 30 kilowatts (kW) of power per rack. This power consumption exceeds the limit of the local power grid in many existing data centers.

To save energy, non-peak workload could be consolidated into a smaller number of servers and idle servers could be powered down. On the other hand, to satisfy the power constraint, workload could be spread over several servers to keep the power and thermal demands (power capping) within the capacity of the facility. These two important optimizations can only be done if the physical locations of the servers are known. This is not the case today. Large data centers can have hundreds to thousands of servers, installed at different times and made by different vendors. There is no way to automatically determine the locations of these servers.

During the installation phase, usually it is the facility engineer who decides where to locate the equipment based on the power and thermal requirements. After that, the system administrator generates a logical name for the machine. Each machine can have one or many logical names depending on the application. From then on, the system management software only deals with the machine names and it does not know about their physical locations. That is why, in many existing data centers, the exact location of each server is not known.

Accordingly, techniques are needed for identifying locations of equipment in such data centers.

SUMMARY OF THE INVENTION

Principles of the invention provide techniques for identifying the locations of equipment in computing environments such as data centers.

For example, in a first aspect of the invention, a method of identifying a location of at least one computing device in a computing environment, including a plurality of computing devices, includes the following steps. A first representation of temperature conditions associated with the plurality of computing devices is obtained while the at least one computing device is in a first mode. The at least one computing device is placed into a second mode. A second representation of temperature conditions associated with the plurality of computing devices is obtained while the at least one computing device is in the second mode. The location of the at least one computing device is determined using the first representation and the second representation.

The first mode may be one of a normal operating mode and an idle mode, and the second mode may be the other of the normal operating mode and the idle mode.

The respective steps of obtaining the first and second representations of temperature conditions associated with the plurality of computing devices may further include using at least one thermal imaging device to capture the first and second representations. In such an embodiment, the first and second representations include respective thermal images taken by the at least one thermal imaging device of temperature conditions of heat-conducting elements attached to exhaust fans of the plurality of computing devices. The heat-conducting elements may be plastic strips. When a computing device includes more than one exhaust fan, the corresponding plastic strips may be distinguishable from one another based on at least one of unique shapes and unique sizes.

The method may further include the step of iteratively cycling each of the computing devices from the first mode to the second mode such that first and second representations can be obtained from which locations of each of the computing devices can be determined.

In another embodiment, the respective steps of obtaining the first and second representations of temperature conditions associated with the plurality of computing devices may further include using at least one optical imaging device to capture the first and second representations. In such case, the first and second representations include respective optical images of positions of elements mounted near exhaust fans of the plurality of computing devices.

In yet another embodiment, the respective steps of obtaining the first and second representations of temperature conditions associated with the plurality of computing devices may further include using a wireless transmitter/receiver arrangement to capture the first and second representations. The arrangement may be an infrared transmitter/receiver arrangement.

Given the determined location of the at least one computing device, a power management operation or a troubleshooting operation may be performed.

In a second aspect of the invention, a method of identifying locations of a plurality of computing devices in a computing environment includes the following steps. A first temperature-indicative map of the plurality of computing devices is generated while the plurality of computing devices is in a first mode. The first temperature-indicative map is overlaid with a facility map associated with the computing environment. A first one of the plurality of computing devices is placed into a second mode. A second temperature-indicative map of the plurality of computing devices is generated while the first one of the plurality of computing devices is in the second mode and the remainder of the plurality of computing devices is in the first mode. The second temperature-indicative map is overlaid with the facility map associated with the computing environment. The first temperature-indicative map is compared with the second temperature-indicative map to determine a location of the first one of the plurality of computing devices.

Given the determined location of the first one of the plurality of computing devices, the method further includes the steps of obtaining a thermal image of another portion of the first one of the plurality of computing devices, and taking at least one action with respect to the first one of the plurality of computing devices based on the obtained thermal image.

These and other objects, features and advantages of the present invention will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1A illustrates a data center in which equipment location identification is implemented according to an embodiment of the invention.

FIG. 1B illustrates an exhaust vent and a plastic strip for use in accordance with the embodiment of FIG. 1A.

FIGS. 2A and 2B illustrate temperature difference between normal operating mode and idle mode according to an embodiment of the invention.

FIGS. 3A and 3B illustrate motion difference between normal operating mode and idle mode according to an embodiment of the invention.

FIG. 4 illustrates inputs and outputs of a location algorithm according to an embodiment of the invention.

FIG. 5 illustrates a location algorithm according to an embodiment of the invention.

FIG. 6 illustrates part of a facility map and data base records for use in accordance with an embodiment of the invention.

FIG. 7 illustrates the operation of the location algorithm of FIG. 5 with respect to an overlay of a thermal map and facility map according to an embodiment of the invention.

FIG. 8 illustrates updated data base records and an example of management of servers based on power consumption according to an embodiment of the invention.

FIG. 9A illustrates a data center in which equipment location identification is implemented according to another embodiment of the invention.

FIG. 9B illustrates air inlets and a plastic strip for use in accordance with the embodiment of FIG. 9A.

FIG. 10 illustrates a data center in which equipment location identification is implemented according to yet another embodiment of the invention.

FIG. 11 illustrates a computing system for executing a location algorithm according to an embodiment of the invention.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Illustrative embodiments of the invention provide equipment location identification techniques using a plurality of imaging devices which are strategically located in the proximity of the equipment being monitored. For example, a network of cameras may be placed on a ceiling (of a data center) below which servers to be monitored are located. As used herein, the phrase “data center” refers to any computing environment in which one or more computing devices are located. A data center may comprise more than one facility. Such computing devices may be servers, but principles of the invention are not limited to any particular computing equipment.

It is also assumed that system management software that controls the servers has the ability to sequentially put each server in and out of an idle mode (i.e., a mode wherein the server is not processing data as part of its normal data center function).

In a thermal camera embodiment, thermal images of the equipment, or some part of the equipment (as will be explained below), are taken before and during the idle mode. These thermal images are used to pin-point the location of the equipment within the facility. The idea is that a piece of the equipment that is in idle mode will generate less heat, and thus produce a less pronounced thermal image, than a piece of equipment that is processing data as part of its normal data center function (i.e., normal operating mode).

It is to be appreciated that thermal cameras can also be used for other purposes such as thermal load balancing or detecting higher-than-normal operating conditions which could be a precursor to equipment failure.

Regular optical cameras can also be used in place of the thermal camera (or even in combination therewith). In such an embodiment, the optical camera is positioned such that an image of a cooling fan area of the equipment can be captured before and during idle mode. Thus, the idea in this embodiment is that if the fan of the equipment can be turned off or slowed down during idle mode, then images of the fan area can be used to determine whether the equipment is in the idle mode or the operating mode.

The non-invasive equipment location techniques of the invention can be used on any equipment that can be remotely controlled.

FIG. 1A illustrates one embodiment of a data center in which principles of the invention may be implemented.

As shown, data center 100 comprises a plurality of servers 102-1 through 102-4 arranged in rows with aisles in between the rows. It is to be understood that only four servers are shown for the sake of clarity; however, the invention is not limited to any particular number. That is, there may be more or fewer rows of servers, and each row may have more than one server. Also, the servers need not be in a row arrangement, but rather may be in other arrangements.

The facility includes ceiling 106 and raised floor 108. As is known, cabling (to and from the servers) and cooling ducts are typically located in area 107 under the raised floor.

As is known, cool air is blown in from cooling ducts below the raised floor (attached to the air conditioning unit or CRAC (not shown) of the data center) and into the cold aisle 110. The servers draw the cool air from the cold aisle in through their fronts (F). Hot air exhausts from the backs (B) of the servers into the hot aisles (109-1 and 109-2). The hot air rises above the equipment racks and is drawn back to the air conditioning unit.

In accordance with one embodiment of the invention, a camera (111-1 and 111-2) is mounted above each hot aisle to monitor plastic strips attached to exhaust vents of the servers. Again, it is to be understood that two cameras are shown for the sake of clarity; however, principles of the invention are not limited to any particular number of cameras. Also, devices other than plastic strips can be used to indicate fan movement.

FIG. 1B illustrates a partial view of the back of each server, including exhaust vent 120 and plastic strip 122 attached thereto. It is understood that the server typically includes multiple processing units vertically stacked in an equipment rack configuration. As such, each processing unit may have its own exhaust vent and plastic strip. Accordingly, plastic strips are shorter at the top of the rack and get progressively longer toward the bottom of the rack. In this manner, the camera can see all strips from the top. For example, the plastic strips could come in three lengths: short, medium and long. A short strip is used on the server at the top of the rack, a medium strip on the middle, and a long strip at the bottom. If there is more than one fan per processing unit, the user could choose to install the plastic strip on only one fan (preferably the main fan) or use strips of the same length but different widths.

Cameras 111-1 and 111-2 can be stationary or rotatable, depending on the dimension of the aisle and the camera lens. A preferred embodiment is a fixed camera. Two types of camera can be used: thermal and optical.

With a thermal camera, the modes of the servers (i.e., idle or operating) can be inferred from the temperatures of the strips. The strip should be at the same temperature as the exhaust air, which is hotter during normal operating mode (FIG. 2A) as compared with idle mode (FIG. 2B). Thus, images of the strips taken by the thermal cameras can be viewed by facility personnel for temperature differences (ΔT) to determine which processing units of the servers are in idle mode and which are in normal operating mode. One or more thermal images may be used to form a thermal map.
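The ΔT comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the strip identifiers, the function name `classify_modes`, and the 3 °C threshold are hypothetical and do not come from the specification, which leaves the comparison to facility personnel.

```python
from typing import Dict


def classify_modes(
    baseline_c: Dict[str, float],
    current_c: Dict[str, float],
    min_drop_c: float = 3.0,  # assumed threshold; a real site would calibrate this
) -> Dict[str, str]:
    """Label each strip 'idle' or 'operating' from its temperature drop.

    baseline_c holds strip temperatures taken in normal operating mode,
    current_c holds the temperatures from a later thermal image.
    """
    modes = {}
    for strip_id, normal_temp in baseline_c.items():
        delta_t = normal_temp - current_c[strip_id]  # positive when the strip cooled
        modes[strip_id] = "idle" if delta_t >= min_drop_c else "operating"
    return modes
```

For instance, a strip that falls from 38.0 °C to 31.0 °C would be labeled idle under the assumed threshold, while one that moves from 37.5 °C to 37.2 °C would remain labeled operating.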

With an optical camera, the motion of the plastic strip is indicative of the mode. That is, a moving strip (FIG. 3A) indicates the fan is operating and thus the server is in normal operating mode. A non-moving or barely moving strip (FIG. 3B) indicates the fan is not operating or is operating at a reduced capacity and thus the server is in the idle mode. This embodiment works on a machine that slows down (or stops) the fan in idle mode. Thus, images of the strips taken by the optical cameras can be viewed by facility personnel for motion differences (Δ Motion) to determine which processing units of the servers are in idle mode and which are in normal operating mode. One or more optical images may be used to form an optical map.
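One possible way to quantify the motion difference is simple frame differencing over the image region that contains a strip. The sketch below is an assumption for illustration, not a technique named in the specification: frames are plain lists of grayscale pixel rows, and the names `motion_score`, `fan_mode`, and the threshold of 10 gray levels are hypothetical.

```python
from typing import List

Frame = List[List[int]]  # grayscale pixels (0-255) of the strip region


def motion_score(frame_a: Frame, frame_b: Frame) -> float:
    """Mean absolute pixel difference between two frames of the same region."""
    total = 0
    count = 0
    for row_a, row_b in zip(frame_a, frame_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def fan_mode(frame_a: Frame, frame_b: Frame, threshold: float = 10.0) -> str:
    """Return 'operating' if the strip moved noticeably, else 'idle'.

    The threshold is an assumed value and would need tuning per camera.
    """
    return "operating" if motion_score(frame_a, frame_b) > threshold else "idle"
```

A fluttering strip changes many pixels between consecutive frames and scores high, whereas a strip hanging still scores near zero.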

Optical cameras are typically less expensive than thermal cameras. However, thermal cameras are more versatile and can be used for other purposes (as will be explained below).

For the remainder of the detailed description, it is assumed that thermal cameras are used, but it is straightforward to adapt the inventive methodologies for the optical camera case.

FIG. 4 illustrates a methodology for identifying the location of equipment in a data center, according to an embodiment of the invention. As shown, location algorithm 400 inputs machine identifiers (IDs) 402, thermal maps 404 and facility map 406, and determines location 408 of each machine and which zone 410 it is in.

Machine IDs are the IDs of each machine in the data center. The term “machine” depends on the data center equipment. A “machine” may refer to a server, or where each server has multiple processing units (as described above), a “machine” may refer to a processing unit. In any event, a unique identifier is pre-assigned to each machine. The thermal maps are the images taken by the thermal cameras (e.g., 111-1 and 111-2 in FIG. 1A). The facility map has the dimensions of the room and shows the locations of major items such as entrances, exits, air conditioning units, aisles, rows of equipment and camera locations. This facility map also contains information on which locations belong to which zones.

FIG. 5 illustrates the location algorithm in more detail, and refers to an example, shown in FIGS. 6 and 7. That is, given the setup shown in FIG. 6, steps 501 through 512 of FIG. 5 are performed, as detailed in FIG. 7.

Facility map 600 in FIG. 6 shows that there are seven machines installed in four racks at locations (Loc) 1, 2, 3 and 4. Locations 1 and 2 belong to zone 1. Similarly, locations 3 and 4 are in zone 2. Each zone has its own air conditioning unit. The system administrator sets up data base records 601 listing the machine IDs, machine types, manufacturer names and power consumption levels. At this point, the locations and zone numbers are not in the data base and are not known to the system management software. Such locations and zones are determined in accordance with the invention, and then added to the data base records.

FIG. 7 shows how the location algorithm is applied to this setup. With all machines operating in normal operating mode (501), thermal map Map0 (502) identifies seven exhaust vents (503), corresponding to the seven machines. Each vent is represented as a shaded circle on the map. Note that the thermal map can be generated in a straightforward manner from the thermal images of the plastic strips taken above the equipment, as explained above in the context of FIGS. 1A through 2B.

By overlaying Map0 with the facility map (overlay depicted as 701), the locations of the vents can be resolved (504). Now, individual machines can be cycled to idle mode, starting with the first machine by setting M=1 (505 and 506). After a short delay (for the exhaust temperature to drop), the thermal cameras yield another thermal map, Map1 (507), which can be overlaid with the facility map. The difference between Map0 and Map1 is the vent of machine 1 (508), depicted as the clear circle on the overlay map. This can be translated (509) to location and zone numbers (Loc 2, Zone 1). The same process is repeated for other machines (510 and 511 are used to iterate until all locations and zones are determined). The process then ends (512).
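The loop of steps 501 through 512 can be sketched as follows. This is a minimal illustration under several assumptions that are not in the specification: a thermal map is modeled as a dictionary from a vent position on the overlay grid to a strip temperature, the facility map as a dictionary from vent position to a (location, zone) pair, and the callbacks `capture_map` and `set_mode` are hypothetical stand-ins for the cameras and the system management software. The capture callback is assumed to wait out the short cool-down delay itself.

```python
from typing import Callable, Dict, List, Tuple

Vent = Tuple[int, int]                       # vent position on the overlay grid
ThermalMap = Dict[Vent, float]               # vent position -> temperature (deg C)
FacilityMap = Dict[Vent, Tuple[int, int]]    # vent position -> (location, zone)


def locate_machines(
    machine_ids: List[str],
    capture_map: Callable[[], ThermalMap],
    set_mode: Callable[[str, str], None],
    facility_map: FacilityMap,
    min_drop_c: float = 3.0,  # assumed threshold for "this vent cooled down"
) -> Dict[str, Tuple[int, int]]:
    """Cycle each machine to idle and find the single vent that cooled."""
    map0 = capture_map()  # Map0: all machines in normal operating mode
    found = {}
    for machine_id in machine_ids:
        set_mode(machine_id, "idle")
        map_m = capture_map()  # MapM: only this machine idle
        set_mode(machine_id, "normal")
        # Vents whose temperature dropped relative to Map0.
        cooled = [v for v, t0 in map0.items() if t0 - map_m.get(v, t0) >= min_drop_c]
        if len(cooled) == 1:  # ambiguous or empty results are left unresolved
            found[machine_id] = facility_map[cooled[0]]
    return found
```

Restoring each machine to normal mode before moving on keeps Map0 valid as the reference for every iteration, which mirrors the requirement that the remainder of the machines stay in the first mode.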

As depicted in FIG. 8, the data base records 601 are then updated with the location and zone information. The system management software is now aware that zone 1 (with CRAC 1) has machines M1, M3, M5, M6, and zone 2 (with CRAC 2) contains M2, M7 and M4 (as shown in FIG. 8). Advantageously, if M1 exceeds its thermal budget, workload can be shifted to a machine that is in a different zone, such as M4.
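Once zones are recorded, choosing a target for shifted workload becomes a simple lookup. The sketch below uses the zone assignments stated above; the power figures, the field names, and the rule of preferring the lowest-power machine in another zone are assumptions made for illustration, since the text does not give the values of FIG. 8 or a selection policy.

```python
from typing import Dict, List, Optional

# Zone membership follows the description; power_w values are made up.
RECORDS: List[Dict] = [
    {"id": "M1", "zone": 1, "power_w": 500},
    {"id": "M3", "zone": 1, "power_w": 150},
    {"id": "M5", "zone": 1, "power_w": 300},
    {"id": "M6", "zone": 1, "power_w": 450},
    {"id": "M2", "zone": 2, "power_w": 400},
    {"id": "M7", "zone": 2, "power_w": 350},
    {"id": "M4", "zone": 2, "power_w": 200},
]


def pick_target(records: List[Dict], hot_id: str) -> Optional[str]:
    """Return a machine in a different zone than hot_id, or None if there is none.

    Assumed policy: prefer the candidate with the lowest power draw.
    """
    hot = next(r for r in records if r["id"] == hot_id)
    candidates = [r for r in records if r["zone"] != hot["zone"]]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r["power_w"])["id"]
```

With these illustrative numbers, `pick_target(RECORDS, "M1")` selects M4, matching the example in the text where workload moves from M1 to a machine in zone 2.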

Besides location detection, thermal cameras can also be used for other purposes such as troubleshooting and prevention. By measuring the exhaust temperatures of the same job running at different times, it is possible to detect any anomalies in the operating condition of the machine and forewarn users about an impending malfunction. As another example, thermal cameras can be used to spot-measure local temperatures for thermally balancing the data center, especially after installation of new equipment. For this type of application, the temperature of the inlet air on the front of the server can be measured by thermal camera 111-3 in FIG. 9A, which takes an image of a narrow plastic strip 901 running across the inlet air vents of the server, as shown in FIG. 9B.
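The comparison of exhaust temperatures for the same job at different times could be automated along the following lines. This is a hedged sketch, not the method of the specification: the function name `is_anomalous`, the use of the historical mean as the reference, and the 5 °C tolerance are all assumptions.

```python
from statistics import mean
from typing import List


def is_anomalous(
    history_c: List[float],
    latest_c: float,
    tolerance_c: float = 5.0,  # assumed margin above the historical average
) -> bool:
    """Flag a run whose exhaust temperature is well above earlier runs of the same job."""
    if not history_c:
        return False  # nothing to compare against yet
    return latest_c - mean(history_c) > tolerance_c
```

For example, with earlier readings of 40.0, 41.0 and 39.0 °C, a new reading of 47.0 °C would be flagged, while 42.0 °C would not.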

In the rare cases where cameras cannot be installed, the location algorithm can be implemented with a network of temperature sensors as shown in FIG. 10. As shown, temperature sensors 1002 are placed at the exhaust vents and/or inlet vents of the servers. Multiple sensors can share a directional transmitter (1002-1 through 1002-4), e.g., infrared or microwave. Receivers (1003-1 through 1003-3) are installed on the ceiling or at any high position that is in the line-of-sight with the transmitters. By the well-known method of triangulation, the location of the rack with respect to the room can be measured. By using the location algorithm, the location of individual equipment within the rack can be determined.

FIG. 11 illustrates a computing system 1100 for executing a location algorithm according to an embodiment of the invention. One or more software programs for implementing a location algorithm as described herein may be stored in memory 1104 and executed by processor 1102. Memory 1104 may therefore be considered a processor-readable storage medium. Processor 1102 may include one or more integrated circuits, digital signal processors, computer systems, or other types of processing devices, and associated supporting circuitry, in any combination.

Accordingly, as illustratively described herein, principles of the invention provide many advantages. For example, principles of the invention provide a non-invasive method to determine the location of equipment in a data center using thermal or optical cameras with very little user interaction. Also, principles of the invention provide a method to infer location information by overlaying sensor data and facility blueprints. A method to detect a pending malfunction by measuring exhaust temperature is also provided. Still further, principles of the invention provide a method to determine the location of equipment by using a network of temperature sensors connected to an infrared transmitter.

Although illustrative embodiments of the present invention have been described herein with reference to the accompanying drawings, it is to be understood that the invention is not limited to those precise embodiments, and that various other changes and modifications may be made by one skilled in the art without departing from the scope or spirit of the invention.

1. A method of identifying locations of a plurality of computing devices in a computing environment, comprising the steps of: generating a first temperature-indicative map of the plurality of computing devices while the plurality of computing devices is in a first mode; overlaying the first temperature-indicative map with a facility map associated with the computing environment; placing a first one of the plurality of computing devices into a second mode; generating a second temperature-indicative map of the plurality of computing devices while the first one of the plurality of computing devices is in the second mode and the remainder of the plurality of computing devices is in the first mode; overlaying the second temperature-indicative map with the facility map associated with the computing environment; and comparing the first temperature-indicative map with the second temperature-indicative map to determine a location of the first one of the plurality of computing devices.
 2. The method of claim 1, wherein the first mode is one of a normal operating mode and an idle mode, and the second mode is the other of the normal operating mode and the idle mode.
 3. The method of claim 1, further comprising the step of iteratively cycling each of the remaining computing devices from the first mode to the second mode such that first and second temperature-indicative maps can be obtained and compared from which locations of each of the remaining computing devices can be determined.
 4. The method of claim 1, given the determined location of the first one of the plurality of computing devices, the method further comprising the steps of obtaining a thermal image of another portion of the first one of the plurality of computing devices, and taking at least one action with respect to the first one of the plurality of computing devices based on the obtained thermal image.
 5. The method of claim 1, wherein, given the determined location of the first one of the plurality of computing devices, a power management operation is performed.
 6. The method of claim 1, wherein, given the determined location of the first one of the plurality of computing devices, a troubleshooting operation is performed.
 7. The method of claim 1, wherein the respective steps of generating the first and second temperature-indicative maps of the plurality of computing devices further comprises using at least one thermal imaging device to capture the first and second temperature-indicative maps.
 8. The method of claim 7, wherein the first and second temperature-indicative maps comprise respective thermal images taken by the at least one thermal imaging device of temperature conditions of heat-conducting elements attached to exhaust fans of the plurality of computing devices.
 9. The method of claim 8, wherein the heat-conducting elements are plastic strips.
 10. The method of claim 9, wherein, when a computing device comprises more than one exhaust fan, the corresponding plastic strips are distinguishable from one another based on at least one of unique shapes and unique sizes.
 11. The method of claim 1, wherein the first temperature-indicative map of the plurality of computing devices is generated when each of the plurality of computing devices is in the first mode.
 12. The method of claim 1, wherein the respective steps of generating the first and second temperature-indicative maps of the plurality of computing devices further comprises using at least one optical imaging device to capture the first and second temperature-indicative maps.
 13. The method of claim 12, wherein the first and second temperature-indicative maps comprise respective optical images of positions of elements mounted near exhaust fans of the plurality of computing devices.
 14. Apparatus for identifying locations of a plurality of computing devices in a computing environment comprising: a memory; and at least one processor coupled to the memory and operative to: (i) generate a first temperature-indicative map of the plurality of computing devices while the plurality of computing devices is in a first mode; (ii) overlay the first temperature-indicative map with a facility map associated with the computing environment; (iii) place a first one of the plurality of computing devices into a second mode; (iv) generate a second temperature-indicative map of the plurality of computing devices while the first one of the plurality of computing devices is in the second mode and the remainder of the plurality of computing devices is in the first mode; (v) overlay the second temperature-indicative map with the facility map associated with the computing environment; and (vi) compare the first temperature-indicative map with the second temperature-indicative map to determine a location of the first one of the plurality of computing devices.
 15. The apparatus of claim 14, wherein the first mode is one of a normal operating mode and an idle mode, and the second mode is the other of the normal operating mode and the idle mode.
 16. The apparatus of claim 14, further comprising the step of iteratively cycling each of the remaining computing devices from the first mode to the second mode such that first and second temperature-indicative maps can be obtained and compared from which locations of each of the remaining computing devices can be determined.
 17. The apparatus of claim 14, given the determined location of the first one of the plurality of computing devices, the method further comprising the steps of obtaining a thermal image of another portion of the first one of the plurality of computing devices, and taking at least one action with respect to the first one of the plurality of computing devices based on the obtained thermal image.
 18. The apparatus of claim 14, wherein, given the determined location of the first one of the plurality of computing devices, a power management operation is performed.
 19. The apparatus of claim 14, wherein, given the determined location of the first one of the plurality of computing devices, a troubleshooting operation is performed.
 20. An article of manufacture for identifying a plurality of computing devices in a computing environment, comprising a computer readable storage medium containing one or more programs which when executed by a computer implement the steps of: generating a first temperature-indicative map of the plurality of computing devices while the plurality of computing devices is in a first mode; overlaying the first temperature-indicative map with a facility map associated with the computing environment; placing a first one of the plurality of computing devices into a second mode; generating a second temperature-indicative map of the plurality of computing devices while the first one of the plurality of computing devices is in the second mode and the remainder of the plurality of computing devices is in the first mode; overlaying the second temperature-indicative map with the facility map associated with the computing environment; and comparing the first temperature-indicative map with the second temperature-indicative map to determine a location of the first one of the plurality of computing devices.